NSF PAR Search | NSF Public Access Repository


The FineView Dataset: A 3D Scanned Multi-View Object Dataset of Fine-Grained Category Instances

https://doi.org/10.1109/WACV61041.2025.00549

Onda, Suguru; Farrell, Ryan (February 2025, IEEE)

Nature and wildlife observation is the practice of noting both the occurrence and abundance of plant or animal species at a specific location and time. Common examples of this type of activity are bird watching (birding), insect collecting, and plant observation (botanizing), and these are widely accepted as both recreational and scientific activities in their respective fields. However, many highly similar species are difficult to disambiguate; identifying an observed specimen often requires expert knowledge and experience. This hard problem is called Fine-grained Visual Categorization (FGVC), which focuses on differentiating between hard-to-distinguish object classes. Examples of such fine-level classification include discriminating between similar species of plants and animals or identifying the make and model of vehicles, instead of recognizing these objects at a coarse level. An FGVC example of butterflies is shown in Figure 1. These two species have similar colors and shapes, but the patterns on their wings are distinct. When the specimens are presented in near-identical poses, as in the figure, a machine can perform this classification very effectively. However, under more extreme conditions of pose, illumination, occlusion, etc., the task becomes much harder. While machines struggle in such scenarios, humans can still find the needed visual cues and differences by factoring in the pose of the butterfly and comparing patterns on common parts. This is possible in part because humans can infer an object's rough 3D shape, understand the lighting and camera angle, and even envision what it would look like from another pose. Humans have developed a 3D understanding of butterflies because we have seen moving butterflies before. What if machines had the same information about the object? Information such as object pose, camera angle, object texture, and part labels would undoubtedly help improve performance on the FGVC task.
Free, publicly-accessible full text available February 26, 2026
Elusive Images: Beyond Coarse Analysis for Fine-Grained Recognition

https://doi.org/10.1109/WACV57701.2024.00088

Anderson, Connor; Gwilliam, Matt; Gaskin, Evelyn; Farrell, Ryan (January 2024, IEEE)

While the community has seen many advances in recent years to address the challenging problem of Fine-grained Visual Categorization (FGVC), progress seems to be slowing—new state-of-the-art methods often distinguish themselves by improving top-1 accuracy by mere tenths of a percent. However, across all of the now-standard FGVC datasets, there remain sizeable portions of the test data that none of the current state-of-the-art (SOTA) models can successfully predict. This paper provides a framework for identifying and studying the errors that current methods make across diverse fine-grained datasets. Three models of difficulty—Prediction Overlap, Prediction Rank and Pair-wise Class Confusion—are employed to highlight the most challenging sets of images and classes. Extensive experiments apply a range of standard and SOTA methods, evaluating them on multiple FGVC domains and datasets. Insights acquired from coupling these difficulty paradigms with the careful analysis of experimental results suggest crucial areas for future FGVC research, focusing critically on the set of elusive images that none of the current models can correctly classify. Code is available at catalys1.github.io/elusive-images-fgvc.
Full Text Available
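Of the three difficulty models named in the Elusive Images abstract, Prediction Overlap is the simplest to illustrate. The sketch below assumes one plausible reading of it (for each test image, count how many models predict its label correctly, and call the image elusive when that count is zero); the function names `prediction_overlap` and `elusive_images` and the toy predictions are hypothetical, not taken from the authors' released code.

```python
from typing import Dict, List, Sequence


def prediction_overlap(predictions: Dict[str, Sequence[int]], labels: Sequence[int]) -> List[int]:
    """For each test image, count how many models predict its label correctly (hypothetical helper)."""
    counts = [0] * len(labels)
    for name, preds in predictions.items():
        if len(preds) != len(labels):
            raise ValueError(f"model {name!r} must predict every test image")
        for i, (p, y) in enumerate(zip(preds, labels)):
            counts[i] += int(p == y)
    return counts


def elusive_images(predictions: Dict[str, Sequence[int]], labels: Sequence[int]) -> List[int]:
    """Indices of test images that no model classifies correctly."""
    return [i for i, c in enumerate(prediction_overlap(predictions, labels)) if c == 0]


if __name__ == "__main__":
    labels = [0, 1, 2, 3]
    predictions = {
        "model_a": [0, 1, 0, 1],
        "model_b": [0, 2, 2, 1],
        "model_c": [1, 1, 0, 2],
    }
    print(prediction_overlap(predictions, labels))  # [2, 2, 1, 0]
    print(elusive_images(predictions, labels))      # [3]
```

In this toy run, image 3 is missed by every model, so it lands in the elusive set even though the ensemble gets most other images right.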
Semantic Network Interpretation

https://doi.org/10.1109/WACVW54805.2022.00046

Guo, Pei; Farrell, Ryan (January 2022, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW))

Network interpretation, as an effort to reveal the features learned by a network, remains largely visualization-based. In this paper, our goal is to tackle semantic network interpretation at both the filter and decision levels. For filter-level interpretation, we represent the concepts a filter encodes with a probability distribution of visual attributes. Decision-level interpretation is achieved by textual summarization that generates an explanatory sentence containing clues behind a network’s decision. A Bayesian inference algorithm is proposed to automatically associate filters and network decisions with visual attributes. A human study confirms that semantic interpretation is a beneficial alternative or complement to visualization methods. We demonstrate the crucial role that semantic network interpretation can play in understanding a network’s failure patterns. More importantly, semantic network interpretation enables a better understanding of the correlation between a model’s performance and its distribution metrics, such as filter selectivity and concept sparseness.
Full Text Available
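The Semantic Network Interpretation abstract represents a filter's concept as a probability distribution over visual attributes, inferred with a Bayesian algorithm. The sketch below is a deliberately simplified stand-in, not the paper's algorithm: it assumes each image carries a set of attribute annotations, takes the filter's top-k activating images, and returns the posterior mean of a categorical distribution under a symmetric Dirichlet prior (equivalently, Laplace-smoothed attribute counts). The function name and the `top_k` and `alpha` parameters are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Set


def filter_attribute_distribution(
    activations: Sequence[float],
    image_attributes: Sequence[Set[str]],
    vocabulary: Sequence[str],
    top_k: int = 3,
    alpha: float = 1.0,
) -> Dict[str, float]:
    """Hypothetical sketch: attribute distribution for a single filter.

    activations[i] is the filter's (e.g. max-pooled) response on image i, and
    image_attributes[i] is the set of attributes annotated on that image.
    """
    # Indices of the images on which this filter fires most strongly.
    top = sorted(range(len(activations)), key=lambda i: activations[i], reverse=True)[:top_k]
    vocab = set(vocabulary)
    counts: Counter = Counter()
    for i in top:
        counts.update(a for a in image_attributes[i] if a in vocab)
    # Posterior mean under a symmetric Dirichlet(alpha) prior over the vocabulary.
    total = sum(counts[a] + alpha for a in vocabulary)
    return {a: (counts[a] + alpha) / total for a in vocabulary}


if __name__ == "__main__":
    acts = [0.9, 0.1, 0.8, 0.7]
    attrs = [{"red wing", "striped"}, {"blue"}, {"red wing"}, {"striped"}]
    dist = filter_attribute_distribution(acts, attrs, ["red wing", "striped", "blue"])
    print(dist)  # "blue" never appears in the top-3 images, so it keeps only prior mass 1/7
```

The prior keeps unseen attributes at a small nonzero probability, which is one simple way a filter's concept can be expressed as a distribution rather than a single label.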
Improving Fractal Pre-training

https://doi.org/10.1109/WACV51458.2022.00247

Anderson, Connor; Farrell, Ryan (January 2022, 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV))

The deep neural networks used in modern computer vision systems require enormous image datasets for training. These carefully curated datasets typically have a million or more images across a thousand or more distinct categories. The process of creating and curating such a dataset is a monumental undertaking, demanding extensive effort and labelling expense and necessitating careful navigation of technical and social issues such as label accuracy, copyright ownership, and content bias. What if we had a way to harness the power of large image datasets with few or none of the major issues and concerns currently faced? This paper extends the recent work of Kataoka et al. [15], proposing an improved pre-training dataset based on dynamically generated fractal images. Challenging issues with large-scale image datasets become points of elegance for fractal pre-training: perfect label accuracy at zero cost; no need to store or transmit large image archives; no privacy, demographic-bias, or inappropriate-content concerns, as no humans are pictured; a limitless supply and diversity of images; and images that are free and open-source. Perhaps surprisingly, avoiding these difficulties imposes only a small penalty in performance. Leveraging a newly proposed pre-training task, multi-instance prediction, our experiments demonstrate that fine-tuning a network pre-trained using fractals attains 92.7–98.1% of the accuracy of an ImageNet pre-trained network. Our code is publicly available.
Full Text Available
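Fractal images of the kind described in Improving Fractal Pre-training are commonly generated with iterated function systems (IFS), as in the FractalDB work of Kataoka et al. The sketch below assumes that setup: it samples a random set of 2-D affine maps, rescales each so its Frobenius norm (an upper bound on the spectral norm) stays below `max_norm` so every map is a contraction, and renders the attractor with the chaos game into a small binary grid. Each sampled IFS can serve as its own class, which is why labels come for free. The function names, parameters, and uniform map selection are assumptions for illustration; the paper's generator and its multi-instance prediction task differ in detail.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical encoding of an affine map (a, b, c, d, e, f):
#   x' = a*x + b*y + e,  y' = c*x + d*y + f
Affine = Tuple[float, float, float, float, float, float]


def random_ifs(n_maps: int, rng: random.Random, max_norm: float = 0.8) -> List[Affine]:
    """Sample n_maps random contractive affine maps (an assumed sampler)."""
    maps: List[Affine] = []
    for _ in range(n_maps):
        a, b, c, d = (rng.uniform(-1.0, 1.0) for _ in range(4))
        frob = (a * a + b * b + c * c + d * d) ** 0.5
        # The Frobenius norm bounds the spectral norm, so capping it below 1 makes the map a contraction.
        s = max_norm / frob if frob > max_norm else 1.0
        e, f = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        maps.append((a * s, b * s, c * s, d * s, e, f))
    return maps


def render_chaos_game(
    maps: List[Affine],
    size: int = 32,
    n_points: int = 20000,
    burn_in: int = 20,
    rng: Optional[random.Random] = None,
) -> List[List[int]]:
    """Render the IFS attractor as a size x size binary grid via the chaos game."""
    if n_points <= burn_in:
        raise ValueError("n_points must exceed burn_in")
    if rng is None:
        rng = random.Random(0)
    x, y = 0.0, 0.0
    points = []
    for i in range(n_points):
        a, b, c, d, e, f = rng.choice(maps)  # uniform choice; real generators often weight maps
        x, y = a * x + b * y + e, c * x + d * y + f
        if i >= burn_in:  # skip early points that have not yet converged to the attractor
            points.append((x, y))
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    grid = [[0] * size for _ in range(size)]
    for px, py in points:
        # Normalize to [0, 1) and scale to pixel indices, clamping the top edge.
        col = min(int((px - x0) / (x1 - x0 + 1e-12) * size), size - 1)
        row = min(int((py - y0) / (y1 - y0 + 1e-12) * size), size - 1)
        grid[row][col] = 1
    return grid


if __name__ == "__main__":
    rng = random.Random(42)
    # Each sampled IFS defines one synthetic class, so labels are exact by construction.
    classes = [random_ifs(3, rng) for _ in range(2)]
    for label, ifs in enumerate(classes):
        img = render_chaos_game(ifs, rng=random.Random(label))
        print("class", label, "->", sum(map(sum, img)), "pixels set")
```

Because images are rendered on demand from a handful of numbers per class, nothing needs to be stored or transmitted, which matches the abstract's point about avoiding large image archives.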
Fair Comparison: Quantifying Variance in Results for Fine-grained Visual Categorization

https://doi.org/10.1109/WACV48630.2021.00335

Gwilliam, Matthew; Teuscher, Adam; Anderson, Connor; Farrell, Ryan (January 2021, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV))

For the task of image classification, researchers work arduously to develop the next state-of-the-art (SOTA) model, each benchmarking their own performance against that of their predecessors and peers. Unfortunately, the metric used most frequently to describe a model’s performance, average categorization accuracy, is often used in isolation. As the number of classes increases, as in fine-grained visual categorization (FGVC), the amount of information conveyed by average accuracy alone dwindles. While its most glaring weakness is its failure to describe the model’s performance on a class-by-class basis, average accuracy also fails to describe how performance may vary from one trained model of the same architecture, on the same dataset, to another (both averaged across all categories and at the per-class level). We first demonstrate the magnitude of these variations across models and across class distributions based on attributes of the data, comparing results on different visual domains and different per-class image distributions, including long-tailed distributions and few-shot subsets. We then analyze the impact various FGVC methods have on overall and per-class variance. From this analysis, we highlight the importance of reporting and comparing methods based on information beyond overall accuracy, and we point out techniques that mitigate variance in FGVC results.
Full Text Available
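The Fair Comparison abstract's point that identical average accuracy can hide per-class variation between trained models is easy to check numerically. The sketch below assumes predictions from several training runs of the same architecture on the same test set; it reports the mean and sample standard deviation of overall accuracy across runs, plus the per-class standard deviation. The function names and toy runs are hypothetical.

```python
import statistics
from typing import Dict, List, Sequence


def per_class_accuracy(preds: Sequence[int], labels: Sequence[int], num_classes: int) -> List[float]:
    """Accuracy for each class; assumes every class appears at least once in labels."""
    correct = [0] * num_classes
    total = [0] * num_classes
    for p, y in zip(preds, labels):
        total[y] += 1
        correct[y] += int(p == y)
    return [c / t for c, t in zip(correct, total)]


def run_variance(runs: Sequence[Sequence[int]], labels: Sequence[int], num_classes: int) -> Dict[str, object]:
    """Summarize variation across two or more runs using the sample standard deviation."""
    if len(runs) < 2:
        raise ValueError("need at least two runs to measure variance")
    overall = [sum(p == y for p, y in zip(run, labels)) / len(labels) for run in runs]
    per_class = [per_class_accuracy(run, labels, num_classes) for run in runs]
    return {
        "overall_mean": statistics.mean(overall),
        "overall_std": statistics.stdev(overall),
        # zip(*per_class) groups each class's accuracy across runs.
        "per_class_std": [statistics.stdev(col) for col in zip(*per_class)],
    }


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    runs = [[0, 0, 1, 0], [0, 1, 1, 1]]  # two runs, both 75% accurate overall
    print(run_variance(runs, labels, num_classes=2))
```

Both toy runs score 75% overall, so the overall standard deviation is 0, yet each class's accuracy swings between 50% and 100% (per-class standard deviation of about 0.354). That is exactly the information a single average-accuracy number discards.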
Have Fun Storming the Castle(s)!

https://doi.org/10.1109/WACV48630.2021.00375

Anderson, Connor; Teuscher, Adam; Anderson, Elizabeth; Larsen, Alysia; Shirley, Josh; Farrell, Ryan (January 2021, IEEE Winter Conference on Applications of Computer Vision (WACV))

In recent years, large-scale datasets, each typically tailored to a particular problem, have become a critical factor in fueling rapid progress in the field of computer vision. This paper describes a valuable new dataset that should accelerate research on problems such as fine-grained classification, instance recognition and retrieval, and geolocalization. The dataset, composed of more than 2400 individual castles, palaces and fortresses from more than 90 countries, contains more than 770K images in total. This paper details the dataset's construction process and its characteristics, including annotations such as location (geotagged latitude/longitude and country label), construction date, Google Maps link, and estimated per-class and per-image difficulty. An experimental section provides baseline experiments for important vision tasks including classification, instance retrieval and geolocalization (estimating global location from an image's visual appearance). The dataset is publicly available at vision.cs.byu.edu/castles.
Full Text Available
Intelligent Image Collection: Building the Optimal Dataset

https://doi.org/10.1109/WACV45572.2020.9093292

Gwilliam, Matthew; Farrell, Ryan (March 2020, 2020 IEEE Winter Conference on Applications of Computer Vision (WACV))

Key recognition tasks such as fine-grained visual categorization (FGVC) have benefited from increasing attention among computer vision researchers. The development and evaluation of new approaches relies heavily on benchmark datasets; such datasets are generally built primarily from categories that have images readily available, omitting categories with insufficient data. This paper takes a step back and rethinks dataset construction, focusing on intelligent image collection driven by (i) the inclusion of all desired categories and (ii) the recognition performance on those categories. Based on a small, author-provided initial dataset, the proposed system recommends the categories for which the authors should prioritize collecting additional images, with the intent of optimizing overall categorization accuracy. We show that mock datasets built using this method outperform datasets built without such a guiding framework. Additional experiments give prospective dataset creators intuition into how a dataset should be constructed, based on their circumstances and goals.
Full Text Available
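The Intelligent Image Collection abstract does not say how its system ranks categories, so the sketch below swaps in a simple, clearly hypothetical heuristic. It assumes each category's error falls off as a power law in its image count, err(n) = k * n^(-b), calibrates k from the current error, and greedily assigns each batch of new images to the category with the largest predicted error reduction. The power-law model, `b`, `batch_size`, and the function name are all assumptions, not the authors' method.

```python
from typing import Dict


def allocate_batches(
    counts: Dict[str, int],
    errors: Dict[str, float],
    budget_batches: int,
    batch_size: int = 10,
    b: float = 0.5,
) -> Dict[str, int]:
    """Greedily decide how many new images to collect per category (hypothetical heuristic).

    counts[c] is the current number of images for category c and errors[c] its
    current validation error. Error is modeled as err(n) = k_c * n ** (-b).
    """
    # Calibrate k_c so the model reproduces the observed error at the current count.
    k = {c: errors[c] * counts[c] ** b for c in counts}
    n = dict(counts)
    extra = {c: 0 for c in counts}
    for _ in range(budget_batches):
        # Predicted error reduction from adding one more batch to each category.
        gain = {c: k[c] * (n[c] ** (-b) - (n[c] + batch_size) ** (-b)) for c in counts}
        best = max(gain, key=gain.get)
        n[best] += batch_size
        extra[best] += batch_size
    return extra


if __name__ == "__main__":
    counts = {"A": 10, "B": 40}
    errors = {"A": 0.5, "B": 0.2}
    print(allocate_batches(counts, errors, budget_batches=5))  # {'A': 40, 'B': 10}
```

Under this assumed model, the category with fewer images and higher error absorbs most of the budget until its marginal gain drops below that of the better-covered category, after which collection shifts there.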
Aligned to the Object, Not to the Image: A Unified Pose-Aligned Representation for Fine-Grained Recognition

https://doi.org/10.1109/WACV.2019.00204

Guo, Pei; Farrell, Ryan (January 2019, 2019 IEEE Winter Conference on Applications of Computer Vision (WACV))

Dramatic appearance variation due to pose constitutes a great challenge in fine-grained recognition, one which recent methods using attention mechanisms or second-order statistics fail to adequately address. Modern CNNs typically lack an explicit understanding of object pose and are instead confused by entangled pose and appearance. In this paper, we propose a unified object representation built from pose-aligned regions of varied spatial sizes. Rather than representing an object by regions aligned to image axes, the proposed representation characterizes appearance relative to the object's pose using pose-aligned patches whose features are robust to variations in pose, scale and viewing angle. We propose an algorithm that performs pose estimation and forms the unified object representation as the concatenation of pose-aligned region features, which is then fed into a classification network. The proposed algorithm attains state-of-the-art results on two fine-grained datasets, notably 89.2% on the widely-used CUB-200 dataset and 87.9% on the much larger NABirds dataset. Our success relative to competing methods shows the critical importance of disentangling pose and appearance for continued progress in fine-grained recognition.
Full Text Available
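To make "aligned to the object, not to the image" concrete: a patch can be sampled in a coordinate frame attached to two part keypoints rather than to the image axes, so the same body part yields the same patch regardless of its in-plane orientation. The sketch below assumes keypoints are already given (the paper estimates pose itself), uses nearest-neighbour sampling on a 2-D list, and substitutes raw pixels for the CNN region features that the paper concatenates. All names and parameters (`aligned_patch`, `unified_representation`, `size`, `margin`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Image = List[List[float]]


def aligned_patch(image: Image, p: Point, q: Point, size: int = 4, margin: float = 0.5) -> Image:
    """Sample a size x size patch whose horizontal axis runs from keypoint p to keypoint q."""
    (px, py), (qx, qy) = p, q
    dx, dy = qx - px, qy - py
    length = math.hypot(dx, dy)
    if length == 0:
        raise ValueError("keypoints must be distinct")
    ux, uy = dx / length, dy / length  # unit vector along the part
    vx, vy = -uy, ux  # unit vector perpendicular to the part
    cx, cy = (px + qx) / 2, (py + qy) / 2  # patch centre = midpoint of the keypoints
    extent = length * (1 + margin)  # patch side length scales with the part
    h, w = len(image), len(image[0])
    patch = []
    for r in range(size):
        t = (r + 0.5) / size - 0.5  # offset across the part, in [-0.5, 0.5]
        row = []
        for c in range(size):
            s = (c + 0.5) / size - 0.5  # offset along the part
            x = cx + extent * (s * ux + t * vx)
            y = cy + extent * (s * uy + t * vy)
            xi = min(max(int(round(x)), 0), w - 1)  # nearest neighbour, clamped to the image
            yi = min(max(int(round(y)), 0), h - 1)
            row.append(image[yi][xi])
        patch.append(row)
    return patch


def unified_representation(image: Image, parts: Sequence[Tuple[Point, Point]], size: int = 4) -> List[float]:
    """Concatenate flattened pose-aligned patches, one per keypoint pair."""
    feats: List[float] = []
    for p, q in parts:
        feats.extend(v for row in aligned_patch(image, p, q, size) for v in row)
    return feats


if __name__ == "__main__":
    n = 10
    ramp_x = [[float(x) for x in range(n)] for _ in range(n)]  # intensity grows left to right
    ramp_y = [[float(y)] * n for y in range(n)]  # intensity grows top to bottom
    # The same "part" drawn horizontally in one image and vertically in the other.
    a = aligned_patch(ramp_x, (2, 5), (8, 5))
    b = aligned_patch(ramp_y, (5, 2), (5, 8))
    print(a == b)  # True: both patches read 2, 4, 6, 8 along the part axis
```

An image-axis crop would see a horizontal gradient in one case and a vertical gradient in the other; the pose-aligned patches are identical, which is the kind of disentangling of pose from appearance the abstract argues for.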
Pairwise Confusion for Fine-Grained Visual Classification

https://doi.org/10.1007/978-3-030-01258-8_5

Dubey, Abhimanyu; Gupta, Otkrist; Guo, Pei; Raskar, Ramesh; Farrell, Ryan; Naik, Nikhil (September 2018, European Conference on Computer Vision)

Fine-Grained Visual Classification (FGVC) datasets contain small sample sizes, along with significant intra-class variation and inter-class similarity. While prior work has addressed intra-class variation using localization and segmentation techniques, inter-class similarity may also affect feature learning and reduce classification performance. In this work, we address this problem using a novel optimization procedure for end-to-end neural network training on FGVC tasks. Our procedure, called Pairwise Confusion (PC), reduces overfitting by intentionally introducing confusion in the activations. With PC regularization, we obtain state-of-the-art performance on six of the most widely used FGVC datasets and demonstrate improved localization ability. PC is easy to implement, does not need excessive hyperparameter tuning during training, and does not add significant overhead during test time.
Full Text Available
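A minimal sketch of a Pairwise Confusion style objective follows. It assumes the confusion is applied to softmax outputs as a squared Euclidean distance: the batch is split into two halves paired element-wise, and for each pair with different labels the distance between the two predicted distributions is added to the mean cross-entropy, weighted by `lam`. It only computes the loss value (in practice a framework's autograd would supply gradients), and the pairing scheme, normalization, and default `lam` are assumptions rather than the authors' exact settings.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits: Sequence[float], label: int) -> float:
    return -math.log(softmax(logits)[label])


def euclidean_confusion(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two predicted class distributions."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def pairwise_confusion_loss(logits: Sequence[Sequence[float]], labels: Sequence[int], lam: float = 1.0) -> float:
    """Mean cross-entropy plus lam times the confusion term averaged over all pairs.

    Pairs whose labels match contribute zero confusion (assumed pairing: i with i + n/2).
    """
    n = len(labels)
    if n == 0 or n % 2:
        raise ValueError("batch size must be a positive even number")
    half = n // 2
    ce = sum(cross_entropy(z, y) for z, y in zip(logits, labels)) / n
    confusion = 0.0
    for i in range(half):
        j = i + half
        if labels[i] != labels[j]:  # only pairs from different classes are confused
            confusion += euclidean_confusion(softmax(logits[i]), softmax(logits[j]))
    return ce + lam * confusion / half


if __name__ == "__main__":
    logits = [[4.0, 0.0], [0.0, 4.0]]  # confident, correct, and very different predictions
    labels = [0, 1]
    print(pairwise_confusion_loss(logits, labels, lam=0.0))
    print(pairwise_confusion_loss(logits, labels, lam=1.0))  # larger: the confusion term penalizes the gap
```

The regularizer pushes differently-labelled samples toward less confident, more similar output distributions, which is one way to read "intentionally introducing confusion" as a guard against overfitting on small FGVC datasets.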
